Data Pipeline Troubleshooting
1 Intro
This document records changes made to the pipeline to fix previously identified bugs. Its purpose is to help ensure that future fixes for newly found bugs do not reintroduce older bugs (and to limit the chance that such changes create new ones).
Each previously solved problem gets its own section with a brief description of the problem provided in the heading. Subheadings then capture:
- Further details about a given problem
- An example which illustrates the problem for a given record_id/policy_id
- Documentation of all known record_id/policy_ids this problem affected
- The code snippet which resolved the problem
- A check of whether the current version of the dataset matches a screenshot of an older version that correctly processed the given example; the check also displays the record_id/policy_id that exemplifies the new bug we want to fix
In this version of the markdown, we are testing that any changes made to solve problems reported for the following policy_ids remain consistent with previously solved issues: 4355719.
2 Multiple corrections to the same update need to be properly processed
2.1 Details
Previously, when there were multiple corrections to the same update, only one of the corrections would overwrite the update; the other correction remained in the dataset as an unresolved duplicate.
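The intended behaviour can be sketched on a toy table. This is a minimal illustration, not the pipeline code: the rows and ids are invented, and it assumes that `record_id_overwrite` holds the id of the record a row is meant to replace (with an original row pointing at itself).

```r
library(dplyr)

# Hypothetical toy data: one update (R_u1) and two later corrections that
# both overwrite it. Column names follow the pipeline; values are invented.
toy <- tibble(
  record_id           = c("R_u1", "R_c1", "R_c2"),
  record_id_overwrite = c("R_u1", "R_u1", "R_u1"),
  correct_dum         = c("original", "correction", "correction"),
  recorded_date       = as.POSIXct(
    c("2021-03-10 08:00:00", "2021-03-22 09:00:00", "2021-03-24 07:00:00"),
    tz = "UTC"
  )
)

# Keep only the most recent row per overwritten record.
latest_per_target <- toy %>%
  group_by(record_id_overwrite) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  slice(1) %>%
  ungroup()

print(latest_per_target)
```

Both corrections target `R_u1`, so only the later one (`R_c2`) survives and the earlier correction no longer lingers as a duplicate.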
2.2 Example
The 4th and 5th entries in the attached screenshot are both corrections to the update (3rd entry). But only one of them overwrites the update entry, while the other is turned into a new, extraneous update during processing.
2.3 Record or policy ids this affected
This problem is known to have affected the following ids:
policy_id <- c(3313228)
2.4 Code which resolves this issue
This code can be found in 2b:
# `%!in%` is a project helper defined elsewhere in the pipeline
kept_corrected_updates <- qualtrics %>%
  group_by(correct_record_id) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  filter(entry_type %in% 'update') %>%
  filter(any(
    correct_dum %in% 'correction' &
      entry_type %in% 'update' & !is.na(update_id_qurl)
  )) %>%
  ungroup() %>%
  # Fix for this problem: keep only the most recent correction to each update
  group_by(record_id_overwrite) %>%
  arrange(desc(recorded_date)) %>%
  slice(1) %>%
  ungroup() %>%
  group_by(correct_record_id) %>%
  filter(ResponseId %!in% update_id_qurl) %>%
  ungroup() %>%
  group_by(correct_record_id, update_id_qurl) %>%
  mutate(index = 1:n()) %>%
  filter(case_when(
    all(correct_dum == 'original') ~ TRUE,
    all(correct_dum == 'correction') &
      min(index) ~ TRUE
  )) %>%
  ungroup()
2.5 Check if code still works
The output of the code below for 3313228 should match the screenshot (taken on June 5, 2021) below insofar as the policy history in the description includes:
- new entry: ‘On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage.” Therefore, the Aichi Prefectural Government requested residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021.’
- update 1: ‘On March 10, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that the requested period for this measure has been extended until March 21, 2021.’
Note that (i) the record ids may change if there are future corrections and (ii) there may be additional updates to this policy to come.
| record_id | policy_id | correct_record_id | correct_dum | recorded_date | entry_type | update_level | update_type | link_type | description_update | type | date_end_spec | date_end |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R_1P7Q68URMzMwh1l | 3313228 | 3313228 | correction | 2021-03-22 09:31:28 | new_entry | NA | NA | C | On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage.” Therefore, the Aichi Prefectural Government requested residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021. | Curfew | The policy has a clear end date | 2021-03-14 |
| R_2YbKsf7DEDLTKMb | 3313228 | 3313228 | correction | 2021-03-24 07:43:30 | update | Strengthening | Change of Policy | C | On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage. Therefore, the Aichi Prefectural Government is requesting residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021. UPDATE: On March 10, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that the requested period for this measure has been extended until March 21, 2021. | Curfew | The policy has a clear end date | 2021-03-21 |
| R_2TBRBurMmjINovu | 4355719 | 4355719 | correction | 2021-04-24 14:37:20 | new_entry | NA | NA | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” | Internal Border Restrictions | The policy has a clear end date | 2020-05-01 |
| R_2thi9iwGIvPneNl | 4355719 | 4355719 | correction | 2021-04-27 14:46:40 | update | Strengthening | Change of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, April 29, 2020. Premier Brian Pallister announces that “travel restrictions will remain in place” indefinitely. | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_3M0ALZn1gB8tVKi | 4355719 | 4355719 | correction | 2021-04-27 14:47:34 | update | Both Strengthening and Relaxing | Change of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, June 1, 2020. The Manitoba government is “allowing direct travel to northern parks, campgrounds, cabins, lodges and resorts while ensuring physical distancing.” | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_245BV3nFyWmseJz | 4355719 | 4355719 | correction | 2021-04-27 14:48:38 | update | NA | End of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, June 26, 2020. New public health orders “remove restrictions on travel to northern Manitoba and remote communities.” | Internal Border Restrictions | The policy has a clear end date | 2020-06-26 |
3 Keep multiple updates to a new entry (while also making sure to remove multiple corrections to the same update as noted above)
3.1 Details
When there are multiple updates to the same new_entry, it is important to not delete them while at the same time removing multiple corrections to a given update.
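To see why the choice of grouping variable matters here, the sketch below contrasts two groupings on invented data. It assumes, hypothetically, that `record_id_overwrite` identifies which update a row replaces; the ids, dates, and the names `too_coarse` and `kept` are made up for illustration and are not part of the pipeline.

```r
library(dplyr)

# Hypothetical thread: two distinct updates (R_u1, R_u2) to one new entry,
# where the second update was corrected twice. All values are invented.
toy <- tibble(
  policy_id           = "P1",
  record_id           = c("R_u1", "R_u2", "R_u2_c1", "R_u2_c2"),
  record_id_overwrite = c("R_u1", "R_u2", "R_u2", "R_u2"),
  recorded_date       = as.POSIXct(
    c("2021-04-01 09:00:00", "2021-04-02 09:00:00",
      "2021-04-08 09:00:00", "2021-05-03 15:00:00"),
    tz = "UTC"
  )
)

# Grouping at the policy level collapses the thread to a single row,
# which wrongly drops a legitimate update.
too_coarse <- toy %>%
  group_by(policy_id) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  slice(1) %>%
  ungroup()

# Grouping by the overwritten record keeps one row per update and, within
# each update, only the most recent correction.
kept <- toy %>%
  group_by(record_id_overwrite) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  slice(1) %>%
  ungroup()

print(too_coarse$record_id)
print(kept$record_id)
```

In this toy case `kept` retains `R_u1` and `R_u2_c2`, while `too_coarse` retains only `R_u2_c2`.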
3.2 Example
In the example below, the following code correctly keeps all updates to a given new_entry (2223262) but fails to delete multiple corrections to a given update (3313228):
kept_corrected_updates_1 <- qualtrics %>%
  group_by(correct_record_id) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  filter(entry_type %in% 'update') %>%
  filter(any(correct_dum %in% 'correction' & entry_type %in% 'update' & !is.na(update_id_qurl))) %>%
  filter(record_id %!in% update_id_qurl) %>%
  group_by(correct_record_id, update_id_qurl) %>%
  slice(1) %>%
  ungroup()
Conversely, the following code incorrectly deletes some updates to a given new_entry (2223262) but correctly deletes multiple corrections to a given update (3313228):
kept_corrected_updates <- qualtrics %>%
  group_by(correct_record_id) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  filter(entry_type %in% 'update') %>%
  filter(any(correct_dum %in% 'correction' & entry_type %in% 'update' & !is.na(update_id_qurl))) %>%
  filter(record_id %!in% update_id_qurl) %>%
  ungroup()
3.3 Record or policy ids this affected
This problem is known to have affected the following ids:
3.4 Solution
The following code keeps multiple updates to the same new_entry while also deleting multiple corrections to the same update:
kept_corrected_updates <- qualtrics %>%
  group_by(correct_record_id) %>%
  arrange(desc(recorded_date), .by_group = TRUE) %>%
  filter(entry_type %in% 'update') %>%
  filter(any(
    correct_dum %in% 'correction' &
      entry_type %in% 'update' & !is.na(update_id_qurl)
  )) %>%
  ungroup() %>%
  # This part deletes multiple corrections to the same update
  group_by(record_id_overwrite) %>%
  arrange(desc(recorded_date)) %>%
  slice(1) %>%
  ungroup() %>%
  group_by(correct_record_id) %>%
  filter(ResponseId %!in% update_id_qurl) %>%
  ungroup() %>%
  # This part makes sure we keep multiple updates to the same new entry
  group_by(correct_record_id, update_id_qurl) %>%
  mutate(index = 1:n()) %>%
  filter(case_when(
    all(correct_dum == 'original') ~ TRUE,
    all(correct_dum == 'correction') &
      min(index) ~ TRUE
  )) %>%
  ungroup()
3.5 Check if code still works
The output of the code below for 2223262 should match the screenshot (taken on June 5, 2021) below insofar as the policy history in the description should include the following (note that (i) the record ids may change if there are future corrections and (ii) there may be additional updates to this policy to come):
- new entry: ‘Canada, Manitoba, March 23, 2020. Public health officials “are recommending that anyone who returns from [domestic] travel … should self-isolate … for 14 days following their return. This recommendation does not include: the commercial transportation of goods and services; workers, including health-care workers who live in a neighbouring jurisdiction and travel to Manitoba for work; or normal personal travel in border communities including visits to a cottage.”’
- update 1: ‘Canada, Manitoba, April 17, 2020. “The chief provincial public health officer has updated public health orders that take effect on April 17, and will be in effect until May 1, 2020. They mandate that anyone entering Manitoba, regardless of whether it was from another country or another province must self-isolate for 14 days.”’
- update 2: ‘Canada, Manitoba, April 29, 2020. “Requirements for self-isolation for 14 days following travel will continue” indefinitely.’
- update 3: ‘Canada, Manitoba, June 21, 2020. Prime Minister Brian Pallister announces that Manitoba will be “allowing people from British Columbia, Alberta, Saskatchewan, Yukon, Northwest Territories and Nunavut, and people living in the area of the northwestern Ontario (west of Terrace Bay) to visit Manitoba without having to self-isolate for 14 days.”’
The output of the code below for 3313228 should match the screenshot (taken on June 5, 2021) below insofar as the policy history in the description should include the following (note that (i) the record ids may change if there are future corrections and (ii) there may be additional updates to this policy to come):
- new entry: ‘On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage.” Therefore, the Aichi Prefectural Government requested residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021.’
- update 1: ‘On March 10, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that the requested period for this measure has been extended until March 21, 2021.’
ra_data_pull_purified_all_sample %>%
  filter(policy_id %in% c(2223262,
                          3313228,
                          test_policy_ids)) %>%
  arrange(policy_id) %>%
  select(record_id, policy_id, correct_record_id, correct_dum, recorded_date, entry_type, update_level, update_type, link_type, description_update, type, date_end_spec, date_end) %>%
  kbl(caption = "Check whether multiple updates to the same new_entry are kept") %>%
  kable_styling(c("striped", "hover"), full_width = F, font_size = 11) %>%
  column_spec(8, width_min = '3in')

| record_id | policy_id | correct_record_id | correct_dum | recorded_date | entry_type | update_level | update_type | link_type | description_update | type | date_end_spec | date_end |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R_D0p59F1e8fAtVxn | 2223262 | 2223262 | correction | 2021-04-08 08:59:20 | new_entry | NA | NA | C | Canada, Manitoba, March 23, 2020. Public health officials “are recommending that anyone who returns from [domestic] travel … should self-isolate … for 14 days following their return. This recommendation does not include: the commercial transportation of goods and services; workers, including health-care workers who live in a neighbouring jurisdiction and travel to Manitoba for work; or normal personal travel in border communities including visits to a cottage.” | Quarantine | The policy’s end date is unknown or unreported | NA |
| R_1NeH5ErH65NMykJ | 2223262 | 2223262 | correction | 2021-04-08 09:01:40 | update | Strengthening | Change of Policy | C | Canada, Manitoba, March 23, 2020. Public health officials “are recommending that anyone who returns from [domestic] travel … should self-isolate … for 14 days following their return. This recommendation does not include: the commercial transportation of goods and services; workers, including health-care workers who live in a neighbouring jurisdiction and travel to Manitoba for work; or normal personal travel in border communities including visits to a cottage.” UPDATE: Canada, Manitoba, April 17, 2020. “The chief provincial public health officer has updated public health orders that take effect on April 17, and will be in effect until May 1, 2020. They mandate that anyone entering Manitoba, regardless of whether it was from another country or another province must self-isolate for 14 days.” | Quarantine | The policy has a clear end date | 2021-05-01 |
| R_V57SZuHi0dB95Rv | 2223262 | 2223262 | correction | 2021-05-03 15:02:29 | update | Strengthening | Change of Policy | C | Canada, Manitoba, March 23, 2020. Public health officials “are recommending that anyone who returns from [domestic] travel … should self-isolate … for 14 days following their return. This recommendation does not include: the commercial transportation of goods and services; workers, including health-care workers who live in a neighbouring jurisdiction and travel to Manitoba for work; or normal personal travel in border communities including visits to a cottage.” UPDATE: Canada, Manitoba, April 29, 2020. “Requirements for self-isolation for 14 days following travel will continue” indefinitely. | Quarantine | The policy’s end date is unknown or unreported | NA |
| R_XM6ZW1b4VH5TGGB | 2223262 | 2223262 | correction | 2021-04-08 09:03:40 | update | NA | End of Policy | C | Canada, Manitoba, March 23, 2020. Public health officials “are recommending that anyone who returns from [domestic] travel … should self-isolate … for 14 days following their return. This recommendation does not include: the commercial transportation of goods and services; workers, including health-care workers who live in a neighbouring jurisdiction and travel to Manitoba for work; or normal personal travel in border communities including visits to a cottage.” UPDATE: Canada, Manitoba, June 21, 2020. Prime Minister Brian Pallister announces that Manitoba will be “allowing people from British Columbia, Alberta, Saskatchewan, Yukon, Northwest Territories and Nunavut, and people living in the area of the northwestern Ontario (west of Terrace Bay) to visit Manitoba without having to self-isolate for 14 days.” | Quarantine | The policy has a clear end date | 2020-06-21 |
| R_1P7Q68URMzMwh1l | 3313228 | 3313228 | correction | 2021-03-22 09:31:28 | new_entry | NA | NA | C | On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage.” Therefore, the Aichi Prefectural Government requested residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021. | Curfew | The policy has a clear end date | 2021-03-14 |
| R_2YbKsf7DEDLTKMb | 3313228 | 3313228 | correction | 2021-03-24 07:43:30 | update | Strengthening | Change of Policy | C | On February 26, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that, “to prevent further spread of COVID-19, the Aichi Prefectural Government has declared a critical stage. Therefore, the Aichi Prefectural Government is requesting residents to “avoid going outside” after 9pm. The requested period is 14 days, starting from March 1, 2021 through March 14, 2021. UPDATE: On March 10, 2021, the Governor of Aichi Prefecture (Japan) ŌMURA Hideaki announced that the requested period for this measure has been extended until March 21, 2021. | Curfew | The policy has a clear end date | 2021-03-21 |
| R_2TBRBurMmjINovu | 4355719 | 4355719 | correction | 2021-04-24 14:37:20 | new_entry | NA | NA | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” | Internal Border Restrictions | The policy has a clear end date | 2020-05-01 |
| R_2thi9iwGIvPneNl | 4355719 | 4355719 | correction | 2021-04-27 14:46:40 | update | Strengthening | Change of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, April 29, 2020. Premier Brian Pallister announces that “travel restrictions will remain in place” indefinitely. | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_3M0ALZn1gB8tVKi | 4355719 | 4355719 | correction | 2021-04-27 14:47:34 | update | Both Strengthening and Relaxing | Change of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, June 1, 2020. The Manitoba government is “allowing direct travel to northern parks, campgrounds, cabins, lodges and resorts while ensuring physical distancing.” | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_245BV3nFyWmseJz | 4355719 | 4355719 | correction | 2021-04-27 14:48:38 | update | NA | End of Policy | C | Canada, Manitoba, April 17, 2020. The Chief Provincial Public Health Officer announces that “travel to northern Manitoba (north of the 53rd parallel of latitude) is prohibited with some exceptions including: residents of northern and remote communities may continue to move within the north; delivery of goods and services may continue; and exceptions include those who travel to the north for employment, medical treatment or to facilitate child-custody agreements.” UPDATE: Canada, Manitoba, June 26, 2020. New public health orders “remove restrictions on travel to northern Manitoba and remote communities.” | Internal Border Restrictions | The policy has a clear end date | 2020-06-26 |
3.6 Details
Corrections that RAs make to a new_entry should percolate down to the whole policy thread; previously they did not.
3.7 Example
In the example below, the new entry was originally coded as ‘External Border Restrictions’ but was then corrected to be a ‘Quarantine’. However, the update for this new_entry remained ‘External Border Restrictions’.
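One way this situation could arise, and how sorting by entry_type before filling addresses it, can be sketched with `tidyr::fill()` on invented data. The sketch assumes that the update row has a missing `type` before filling (so that it inherits whatever value precedes it); the ids and dates are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical thread: a new entry, an update with no type of its own,
# and a later correction of the new entry. All values are invented.
toy <- tibble(
  policy_id     = "P1",
  record_id     = c("R_new", "R_upd", "R_new_corr"),
  entry_type    = c("new_entry", "update", "new_entry"),
  recorded_date = as.POSIXct(
    c("2021-04-01 10:00:00", "2021-04-02 10:00:00", "2021-04-14 20:00:00"),
    tz = "UTC"
  ),
  type          = c("External Border Restrictions", NA, "Quarantine")
)

# Sorted by date only, the update sits directly after the uncorrected
# new entry and inherits the stale type.
by_date <- toy %>%
  group_by(policy_id) %>%
  arrange(recorded_date, .by_group = TRUE) %>%
  fill(type, .direction = "down") %>%
  ungroup()

# Sorted by entry_type first, every new_entry row (ending with the latest
# correction) precedes the updates, so the corrected type is filled down.
by_entry_type <- toy %>%
  group_by(policy_id) %>%
  arrange(entry_type, recorded_date, .by_group = TRUE) %>%
  fill(type, .direction = "down") %>%
  ungroup()

print(by_date %>% select(record_id, type))
print(by_entry_type %>% select(record_id, type))
```

This relies on "new_entry" sorting before "update" alphabetically, which is what makes arranging by entry_type put the corrected new entry ahead of its updates.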
Record or policy ids this affected
This problem is known to have affected the following ids:
3.8 Solution
qualtrics_with_dateendspec_filled <- qualtrics_with_dateendspec %>%
  mutate_if(is.character, list(~na_if(., ""))) %>%
  group_by(record_id_overwrite) %>%
  arrange(policy_id, record_id_overwrite, recorded_date) %>%
  fill(all_of(miss_vars), .direction = "down") %>%
  ungroup() %>%
  group_by(policy_id, entry_type) %>%
  arrange(policy_id, entry_type, recorded_date) %>%
  fill(all_of(miss_vars_noend), .direction = "down") %>%
  ungroup() %>%
  group_by(policy_id) %>%
  arrange(policy_id, entry_type, recorded_date) %>% # arranging by entry_type solves this issue
  fill(all_of(miss_vars_noend[!grepl('update', miss_vars_noend)]), .direction = "down") %>%
  ungroup()

qualtrics_without_dateendspec_filled <- qualtrics_without_dateendspec %>%
  mutate_if(is.character, list(~na_if(., ""))) %>%
  group_by(record_id_overwrite) %>%
  arrange(policy_id, record_id_overwrite, recorded_date) %>%
  fill(all_of(miss_vars), .direction = "down") %>%
  ungroup() %>%
  group_by(policy_id, entry_type) %>%
  arrange(policy_id, entry_type, recorded_date) %>%
  fill(all_of(miss_vars), .direction = "down") %>%
  ungroup() %>%
  group_by(policy_id) %>%
  arrange(policy_id, entry_type, recorded_date) %>% # arranging by entry_type solves this issue
  fill(all_of(miss_vars[!grepl('update', miss_vars)]), .direction = "down") %>%
  ungroup()
3.9 Check code still works
- The output of the code below should match the screenshot for policy_id 1620615 insofar as the ‘type’ of policy for new_entry and all subsequent updates should be ‘Quarantine’
ra_data_pull_purified_all_sample %>%
  filter(policy_id %in% c(1620615,
                          test_policy_ids)) %>%
  arrange(policy_id) %>%
  select(record_id, policy_id, correct_record_id, correct_dum, recorded_date, entry_type, update_level, update_type, link_type, type, date_end_spec, date_end) %>%
  kbl(caption = "Check whether corrections to new entries flow through to updates") %>%
  kable_styling(c("striped", "hover"), full_width = F, font_size = 11) %>%
  column_spec(8, width_min = '3in')

| record_id | policy_id | correct_record_id | correct_dum | recorded_date | entry_type | update_level | update_type | link_type | type | date_end_spec | date_end |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_UDEzXMilbVfhDyh | 1620615 | 1620615 | correction | 2021-04-14 20:50:12 | new_entry | NA | NA | C | Quarantine | The policy’s end date is unknown or unreported | NA |
| R_33dK64Xv0KzYWMM | 1620615 | 1620615 | correction | 2021-04-14 21:02:54 | update | NA | End of Policy | C | Quarantine | The policy has a clear end date | 2020-08-07 |
| R_2TBRBurMmjINovu | 4355719 | 4355719 | correction | 2021-04-24 14:37:20 | new_entry | NA | NA | C | Internal Border Restrictions | The policy has a clear end date | 2020-05-01 |
| R_2thi9iwGIvPneNl | 4355719 | 4355719 | correction | 2021-04-27 14:46:40 | update | Strengthening | Change of Policy | C | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_3M0ALZn1gB8tVKi | 4355719 | 4355719 | correction | 2021-04-27 14:47:34 | update | Both Strengthening and Relaxing | Change of Policy | C | Internal Border Restrictions | The policy’s end date is unknown or unreported | NA |
| R_245BV3nFyWmseJz | 4355719 | 4355719 | correction | 2021-04-27 14:48:38 | update | NA | End of Policy | C | Internal Border Restrictions | The policy has a clear end date | 2020-06-26 |
NOTE: if there is more than one type of policy for a given tested record, there is a problem with the code.
ra_data_pull_purified_all_sample %>%
  filter(policy_id %in% c(1620615,
                          test_policy_ids)) %>%
  group_by(policy_id) %>%
  summarise(unique_types = length(unique(type))) %>%
  ungroup()
## # A tibble: 2 x 2
## policy_id unique_types
## * <chr> <int>
## 1 1620615 1
## 2 4355719 1
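The same check could also be made to fail loudly instead of relying on a visual read of the table. The following is a sketch under stated assumptions: `checked` is a small stand-in for `ra_data_pull_purified_all_sample` that reuses the policy ids and types shown in the table above, and the names `type_counts` and `inconsistent` are hypothetical.

```r
library(dplyr)

# Stand-in for the filtered pipeline output (policy ids and types copied
# from the table above; the real check would use the pipeline data frame).
checked <- tibble(
  policy_id = c("1620615", "1620615", "4355719", "4355719"),
  type      = c("Quarantine", "Quarantine",
                "Internal Border Restrictions", "Internal Border Restrictions")
)

# Count the distinct policy types within each policy thread.
type_counts <- checked %>%
  group_by(policy_id) %>%
  summarise(unique_types = n_distinct(type), .groups = "drop")

# Any policy with more than one type signals a problem with the code.
inconsistent <- type_counts %>%
  filter(unique_types > 1)

# Stops with an error if any policy thread mixes types.
stopifnot(nrow(inconsistent) == 0)
```

Using `stopifnot()` means a knit of the document would halt if a future change reintroduced the bug, rather than silently printing a count greater than 1.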